Scenario Design for Spoken Language Dialogue Systems Development

Authors

  • Laila Dybkjær
  • Niels Ole Bernsen
  • Hans Dybkjær
Abstract

Adequate data acquired through the Wizard of Oz experimental prototyping method are still crucial to the cost-effective development of advanced spoken language dialogue systems. One important source of data corruption is the unintended priming of subjects through the task scenario representations used in the experiments. The paper presents the three sets of development and test scenario representations which were used in the Danish Dialogue project. Based on the third set of scenarios, an experiment was conducted to investigate the effects of a masking strategy which effectively avoids the possibility of priming the WOZ subjects. The experimental results are presented and discussed.

1. THE ROLE OF SCENARIOS IN SPOKEN LANGUAGE DIALOGUE SYSTEMS DESIGN

Scenarios are important tools in spoken language dialogue systems (SLDSs) development and testing. Nonetheless, the SLDS literature has little to say about scenario design and the many problems to be aware of. This paper presents conclusions from the Danish Dialogue project as regards the construction, representation and use of scenarios in SLDS design.

Over the last three years, the authors have designed and implemented the dialogue part of a realistic SLDS prototype, P2, which has been developed in collaboration with the Center for PersonKommunikation at Aalborg University and the Centre for Language Technology in Copenhagen. The domain of P2 is Danish domestic airline ticket reservation. The P2 dialogue model was developed by means of the Wizard of Oz (WOZ) experimental prototyping method [3, 5, 6]. WOZ is an iterative process of testing and revising the dialogue model, which continues until the model is found acceptable for implementation. The implemented dialogue model is subjected to further testing. Each of these tests requires the use of predefined scenarios.

The purpose of using scenarios is to develop and test the dialogue model on the basis of realistic situations of use of the SLDS under construction. Scenarios prescribe tasks embedded in realistic situations of use, which subjects, i.e. the persons acting as users, are asked to perform through spoken dialogue with the system. The scenario-based dialogues provide crucial data on user-system behaviour during dialogue, i.e. on user reactions to various aspects of the system’s behaviour and vice versa, as well as on users’ sublanguage vocabulary, utterance length, dialogue act types, number of turns per scenario, grammatical complexity, utterance ungrammaticality, task ordering preferences, problem-solving strategies, etc. An additional aim in using scenarios is to achieve some amount of systematicity in the testing process. There is, however, no known method for designing scenarios which are representative of all possible situations of use of the artefact being designed [7]. So the basic problem in scenario design is to capture, in a limited set of scenarios, as much as possible of the space of possible situations of use.

2. THE P2 DEVELOPMENT AND TEST SCENARIOS

Seven WOZ iterations were performed to design the P2 dialogue model, which was then implemented and tested. Three different sets of scenarios were constructed in the process: one set for the first four WOZ iterations, a second set for the following three iterations, and a third set for the prototype user test.

The first set of scenarios was relatively small, comprising ten scenarios which were not designed to systematically represent as many situations of use as possible. The scenarios were simply considered as a set of cases for which the system should work and were mainly used for domain and task exploration and training by the two system designers acting as wizard and subject, respectively. The subject often revised a scenario and sometimes invented a new scenario on the fly which was never written down.

The second set of scenarios was designed on the basis of the dialogue structure that emerged from the fourth WOZ iteration. By then the scenarios could be designed in a more systematic way, as most of the domain and task structure had been uncovered.

The first two sets of scenarios conform to the notion of development scenarios, i.e. scenarios which are intended to more or less systematically cover the intended system functionality and are normally designed by the system designers [2]. Our third set of scenarios corresponds rather to the notion of evaluation and test scenarios [2]. Based on the WOZ scenario experiences, we carefully considered what to test and why. We decided, among other things, not to do user testing on a number of possible but unlikely cases of communication failure. These have instead been tested in the black-box test. Since the flight ticket reservation task is a well-structured task in which a prescribed amount of information must be exchanged between user and system [1, 4], it was possible to extract from the dialogue structure a set of sub-task components, such as number of travellers, age of traveller, and discount or normal fare, any combination of which should be handled by P2. The scenarios were generated through systematically combining these components.

3. MASKING THE SCENARIO REPRESENTATIONS

A scenario representation represents a task which subjects have to perform through dialogue with the system. A central problem addressed in our design of the test scenarios was the following. The sublanguage vocabulary of P2 had been derived from the scenario-based WOZ dialogues. During the later WOZ experiments we discovered that subjects tended to repeat the date and hour of departure expressions used in the scenarios. This is a problem because a vocabulary defined on the basis of dialogues in which users model scenario phrases may not be sufficiently representative of realistic language use. On the other hand, scenarios clearly have to describe, to some necessary extent, the tasks to be performed by the subjects. It is not obvious, therefore, how one can avoid providing subjects with words or phrases which they will tend to repeat when answering the system’s questions, rather than selecting their own forms of expression.

We decided to investigate how to make it impossible for subjects to model the test scenario representations in unintended ways. We therefore had to consider which information to mask, and how. For each sub-task in the dialogue structure the type of question posed by the system was categorised. There were four types of question. One type invited a yes/no answer. A second type invited an answer containing an element chosen from an explicit list of alternatives, i.e. a multiple choice question. The third type invited the user to state a proper name or something similar to a proper name, such as an airport name or the user’s own customer number. The fourth type comprised open questions about some topic, such as the date of departure.

The interesting point is that in the first three cases, the key information can only be cooperatively expressed in one of several closely related ways, which means that it does not matter if users model the expressions of the scenario representation. It is only in the fourth case that cooperative user answers may express the key information in many different ways. It is exactly in these cases that it is desirable to know how users would normally express themselves and hence mandatory to prevent them from modelling the scenario representations. Questions of this type all concerned date and hour of departure. We therefore decided to concentrate on masking the scenario representations as regards date and hour of departure in order to avoid priming of the subjects.

In general, dates are either expressed in relative terms as being relative to, e.g., today, or in absolute terms as calendar dates. Hours are either expressed in quantitative terms, such as ‘ten fifteen a.m.’ or ‘between ten and twelve’, or in qualitative terms, such as ‘in the morning’ or ‘before the rush hour’. The masked scenario representations never contained reusable expressions referring to dates or hours of departure. Relative dates were expressed using a list of the days from today onwards. Absolute dates were expressed as calendar indices such as might be used by a customer when booking a flight. Quantitative hours were expressed using the face of a clock. Qualitative hours were expressed using (travel) goal state temporal expressions rather than departure state temporal expressions, e.g. ‘they want to arrive early in the evening’. This means that the user (subject), in order to determine when it would be desirable to depart, had to make an inference from the hour indicated in the scenario representation, thus excluding the possibility of priming.
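
The systematic combination of sub-task components described in Section 2, together with the masking of date and hour expressions described in Section 3, can be illustrated with a minimal Python sketch. It is not the procedure used in the Danish Dialogue project. The component values, the inclusion of date and hour kinds among the combined components, and the names `COMPONENTS`, `MASKS`, `generate_scenarios` and `masked_representation` are all assumptions made for illustration. Only the three example components and the four masking devices are taken from the text.

```python
from itertools import product

# Illustrative component values (assumptions, not the actual P2 inventory).
# The first three components are the examples named in the text; treating
# date kind and hour kind as combinable components is also an assumption.
COMPONENTS = {
    "travellers": (1, 2),
    "age_group": ("adult", "child"),
    "fare": ("normal", "discount"),
    "date_kind": ("relative", "absolute"),
    "hour_kind": ("quantitative", "qualitative"),
}

# Masking device for each kind of date or hour expression, as described
# in the text: no reusable verbal expression is shown to the subject.
MASKS = {
    "relative": "list of days from today onwards",
    "absolute": "calendar",
    "quantitative": "clock face",
    "qualitative": "goal-state expression (desired arrival time)",
}


def generate_scenarios(components):
    """Return one dict per combination of component values."""
    names = list(components)
    return [dict(zip(names, values)) for values in product(*components.values())]


def masked_representation(scenario):
    """Replace the date and hour kinds by the device used to present them."""
    rep = dict(scenario)
    rep["date_shown_as"] = MASKS[rep.pop("date_kind")]
    rep["hour_shown_as"] = MASKS[rep.pop("hour_kind")]
    return rep


scenarios = generate_scenarios(COMPONENTS)
print(len(scenarios))  # 2 * 2 * 2 * 2 * 2 = 32 combinations
print(masked_representation(scenarios[0]))
```

Only the date and hour fields are masked in this sketch, since answers to yes/no, multiple choice and proper-name questions can only be expressed in closely related ways and so do not need masking.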

Publication date: 2002